Segmentation of Automatically Transcribed Broadcast News Text

Authors

  • P. van Mulbregt
  • I. Carp
  • L. Gillick
  • S. Lowe
  • J. Yamron
Abstract

Automatic transcription of broadcast speech has progressed to the point where the resulting transcripts can be used for information retrieval. In this paper, we describe the Segmentation system used by Dragon Systems in the Segmentation task of the 1998 TDT evaluation, highlighting improvements made since the September 1998 dry run. Segmentation of closed-caption and human transcripts of news is contrasted with the results of segmenting the ASR transcripts. This is followed by a discussion of the metric, in particular how the value of the segmentation metric relates to the value of the tracking metric when these latter two tasks are performed on automatically segmented ASR text rather than on ASR text with correctly marked boundaries.

Similar resources

Segmentation and Indexation of Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...

Indexing Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments (stories) that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on ...

Topic Indexing of TV Broadcast News Programs

This paper describes a topic segmentation and indexation system for TV broadcast news programs spoken in European Portuguese. The system is integrated in an alert system for selective dissemination of multimedia information developed in the scope of a European project. The goal of this work is to enhance the retrieval of specific spoken documents that have been automatically transcribed, using...

A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation

One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until lately, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition ...

Network of Data Centres (NetDC): BNSC - An Arabic Broadcast News Speech Corpus

Broadcast news is a very rich source of Language Resources that has been exploited to develop and assess a large set of Human Language Technologies. Some examples include systems to: automatically produce text transcriptions of spoken data; identify the language of a text; translate a text from one language to another; identify topics in the news and retrieve all stories discussing a target top...

Publication date: 1999